AITopics | local switching cost

Collaborating Authors

local switching cost

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Provably Efficient Q-Learning with Low Switching Cost

Yu Bai, Tengyang Xie, Nan Jiang, Yu-Xiang Wang

Neural Information Processing SystemsFeb-12-2026, 01:48:17 GMT

Neural Information Processing Systems http://nips.cc/

algorithm, local switching cost, switching cost, (12 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > Canada (0.04)
Asia > Middle East > Jordan (0.04)

Industry: Health & Medicine (0.95)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Provably Efficient Q-Learning with Low Switching Cost

Neural Information Processing SystemsDec-25-2025, 08:07:32 GMT

We take initial steps in studying PAC-MDP algorithms with limited adaptivity, that is, algorithms that change its exploration policy as infrequently as possible during regret minimization. This is motivated by the difficulty of running fully adaptive algorithms in real-world applications (such as medical domains), and we propose to quantify adaptivity using the notion of \emph{local switching cost}. Our main contribution, Q-Learning with UCB2 exploration, is a model-free algorithm for $H$-step episodic MDP that achieves sublinear regret whose local switching cost in $K$ episodes is $O(H^3SA\log K)$, and we provide a lower bound of $\Omega(HSA)$ on the local switching cost for any no-regret algorithm. Our algorithm can be naturally adapted to the concurrent setting \citep{guo2015concurrent}, which yields nontrivial results that improve upon prior work in certain aspects.

algorithm, name change, provably efficient q-learning, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Provably Efficient Q-Learning with Low Switching Cost

Yu Bai, Tengyang Xie, Nan Jiang, Yu-Xiang Wang

Neural Information Processing SystemsOct-2-2025, 15:57:46 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country: North America (0.46)

Industry: Health & Medicine (0.95)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

ad71c82b22f4f65b9398f76d8be4c615-Supplemental.pdf

Neural Information Processing SystemsAug-15-2025, 18:42:24 GMT

inequality, probability, sn 0, (13 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.46)

Add feedback

Reviews: Provably Efficient Q-Learning with Low Switching Cost

Neural Information Processing SystemsJan-23-2025, 09:57:25 GMT

They also present (two flavours of) a Q-learning algorithm that achieve the regret matching the previous work however with the added benefit of having lower local switching cost.

algorithm, provably efficient q-learning, switching cost, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Provably Efficient Q-Learning with Low Switching Cost

Neural Information Processing SystemsOct-9-2024, 23:07:46 GMT

We take initial steps in studying PAC-MDP algorithms with limited adaptivity, that is, algorithms that change its exploration policy as infrequently as possible during regret minimization. This is motivated by the difficulty of running fully adaptive algorithms in real-world applications (such as medical domains), and we propose to quantify adaptivity using the notion of \emph{local switching cost}. Our main contribution, Q-Learning with UCB2 exploration, is a model-free algorithm for H -step episodic MDP that achieves sublinear regret whose local switching cost in K episodes is O(H 3SA\log K), and we provide a lower bound of \Omega(HSA) on the local switching cost for any no-regret algorithm. Our algorithm can be naturally adapted to the concurrent setting \citep{guo2015concurrent}, which yields nontrivial results that improve upon prior work in certain aspects.

algorithm, local switching cost, provably efficient q-learning, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

A Provably Efficient Algorithm for Linear Markov Decision Process with Low Switching Cost

Gao, Minbo, Xie, Tianle, Du, Simon S., Yang, Lin F.

arXiv.org Machine LearningJan-2-2021

Many real-world applications, such as those in medical domains, recommendation systems, etc, can be formulated as large state space reinforcement learning problems with only a small budget of the number of policy changes, i.e., low switching cost. This paper focuses on the linear Markov Decision Process (MDP) recently studied in [Yang et al 2019, Jin et al 2020] where the linear function approximation is used for generalization on the large state space. We present the first algorithm for linear MDP with a low switching cost. Our algorithm achieves an $\widetilde{O}\left(\sqrt{d^3H^4K}\right)$ regret bound with a near-optimal $O\left(d H\log K\right)$ global switching cost where $d$ is the feature dimension, $H$ is the planning horizon and $K$ is the number of episodes the agent plays. Our regret bound matches the best existing polynomial algorithm by [Jin et al 2020] and our switching cost is exponentially smaller than theirs. When specialized to tabular MDP, our switching cost bound improves those in [Bai et al 2019, Zhang et al 20020]. We complement our positive result with an $\Omega\left(dH/\log d\right)$ global switching cost lower bound for any no-regret algorithm.

algorithm, global switching cost, switching cost, (12 more...)

arXiv.org Machine Learning

2101.00494

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(2 more...)

Genre:

Research Report (0.64)
Workflow (0.46)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.70)

Add feedback

Almost Optimal Model-Free Reinforcement Learning via Reference-Advantage Decomposition

Zhang, Zihan, Zhou, Yuan, Ji, Xiangyang

arXiv.org Machine LearningJun-6-2020

We study the reinforcement learning problem in the setting of finite-horizon episodic Markov Decision Processes (MDPs) with $S$ states, $A$ actions, and episode length $H$. We propose a model-free algorithm UCB-Advantage and prove that it achieves $\tilde{O}(\sqrt{H^2SAT})$ regret where $T = KH$ and $K$ is the number of episodes to play. Our regret bound improves upon the results of [Jin et al., 2018] and matches the best known model-based algorithms as well as the information theoretic lower bound up to logarithmic factors. We also show that UCB-Advantage achieves low local switching cost and applies to concurrent reinforcement learning, improving upon the recent results of [Bai et al., 2019].

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv.org Machine Learning

2004.10019

Country:

North America > United States > Illinois (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

Add feedback

Provably Efficient Q-Learning with Low Switching Cost

Bai, Yu, Xie, Tengyang, Jiang, Nan, Wang, Yu-Xiang

Neural Information Processing SystemsMar-18-2020, 23:47:40 GMT

We take initial steps in studying PAC-MDP algorithms with limited adaptivity, that is, algorithms that change its exploration policy as infrequently as possible during regret minimization. This is motivated by the difficulty of running fully adaptive algorithms in real-world applications (such as medical domains), and we propose to quantify adaptivity using the notion of \emph{local switching cost}. Our main contribution, Q-Learning with UCB2 exploration, is a model-free algorithm for $H$-step episodic MDP that achieves sublinear regret whose local switching cost in $K$ episodes is $O(H 3SA\log K)$, and we provide a lower bound of $\Omega(HSA)$ on the local switching cost for any no-regret algorithm. Our algorithm can be naturally adapted to the concurrent setting \citep{guo2015concurrent}, which yields nontrivial results that improve upon prior work in certain aspects. Papers published at the Neural Information Processing Systems Conference.

algorithm, local switching cost, provably efficient q-learning, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Provably Efficient Q-Learning with Low Switching Cost

Bai, Yu, Xie, Tengyang, Jiang, Nan, Wang, Yu-Xiang

arXiv.org Artificial IntelligenceMay-30-2019

We take initial steps in studying PAC-MDP algorithms with limited adaptivity, that is, algorithms that change its exploration policy as infrequently as possible during regret minimization. This is motivated by the difficulty of running fully adaptive algorithms in real-world applications (such as medical domains), and we propose to quantify adaptivity using the notion of local switching cost. Our main contribution, Q-Learning with UCB2 exploration, is a model-free algorithm for H-step episodic MDP that achieves sublinear regret whose local switching cost in K episodes is $O(H^3SA\log K)$, and we provide a lower bound of $\Omega(HSA)$ on the local switching cost for any no-regret algorithm. Our algorithm can be naturally adapted to the concurrent setting, which yields nontrivial results that improve upon prior work in certain aspects.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

1905.12849

Country:

North America > United States > Illinois (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback